Gastroesophageal reflux disease increases the risk of rheumatoid arthritis: a bidirectional two-sample Mendelian randomization study

Rheumatoid arthritis (RA) is a common autoimmune disease, and some observational studies have indicated an association between gastroesophageal reflux disease (GERD) and RA. However, the causal relationship between the two remains uncertain. We used Mendelian randomization (MR) to assess the causal relationship between GERD and RA. Two-sample MR analysis was performed using summary-level data from large-scale genome-wide association studies (GWAS). In addition, we performed multivariable MR analyses to adjust for potential confounders of the relationship between GERD and RA, including smoking quantity, drinking frequency, BMI, depression, and educational attainment. The MR results for GERD on RA suggested a causal effect of genetic susceptibility to GERD on RA (discovery dataset, IVW, odds ratio [OR] = 1.41, 95% confidence interval [CI] 1.22–1.63, P = 2.81 × 10⁻⁶; validation dataset, IVW, OR = 1.38, 95% CI 1.23–1.55, P = 1.76 × 10⁻⁸). Multivariable MR analysis supported this result. The reverse MR analysis did not reveal compelling evidence that RA increases the risk of developing GERD. Our bidirectional two-sample MR analysis and multivariable MR analysis provide support for a causal effect of GERD on RA. This finding could offer new insights for the prevention and treatment of RA.


Experimental design
Bidirectional two-sample Mendelian randomization (TSMR) analysis must adhere to three essential assumptions: (1) Instrumental variables (IVs) should be highly correlated with the exposure factor; (2) IVs must be independent of confounding variables; (3) IVs cannot affect the outcome through pathways other than the exposure factor (Fig. 1).

Data sources

GWAS data for exposure and outcome
We used several GWAS datasets in this study. The GERD data were sourced from IEU Open GWAS (https://gwas.mrcieu.ac.uk/), identifier "ebi-a-GCST90000514", and comprise 602,604 individuals of European ancestry [16]. The RA discovery and validation datasets were derived from the GWAS Catalog [17] (https://www.ebi.ac.uk/gwas) and the FINNGEN database [18] (https://www.finngen.fi/en), respectively. The former dataset includes individuals of European, East Asian, African, South Asian, and Arab descent, and we used only the European data. These European data originated from Canada, the Netherlands, Sweden, the United States, the United Kingdom, France, Germany, and Spain, and comprise 22,350 cases and 74,823 controls of European descent. The latter dataset comes primarily from Finland and includes 12,555 cases and 240,862 controls of European descent (Table 1). All GWAS data used in this study received ethical approval in the original studies, so no additional ethical clearance was required for the current study. The authors and references can be found in the supplementary tables.

GWAS data for multivariable MR analysis
When analyzing the causal relationship between GERD and RA, we adjusted for the effect of potential confounders on the outcome. We performed multivariable MR analysis including smoking quantity, drinking frequency, BMI, depression, and educational attainment. Smoking quantity and drinking frequency were measured as cigarettes per day (n = 784,353) and drinks per week (n = 2,965,643), both from the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) project [19]. More information on that study can be found at https://conservancy.umn.edu/handle/11299/241912. The BMI dataset is from the Genetic Investigation of Anthropometric Traits (GIANT) consortium [20], which provides meta-analysis data for approximately 700,000 individuals of European ancestry (https://pubmed.ncbi.nlm.nih.gov/30124842/). The depression dataset comes from the FINNGEN database [18] and includes 43,280 cases and 329,192 controls (https://www.finngen.fi/en). The educational attainment GWAS summary statistics were derived from the Social Science Genetic Association Consortium (SSGAC). This is the largest educational attainment GWAS to date, covering 71 discovery cohorts with a total of 1,131,881 participants of European ancestry and approximately 10 million SNP loci [21].

IVs selection
We used single nucleotide polymorphisms (SNPs) as IVs. To avoid linkage disequilibrium, SNPs associated with the exposure factor had to meet the following criteria: r² < 0.001, clumping window = 10,000 kb, and significance threshold P < 5.0 × 10⁻⁸. Subsequently, we extracted the SNPs significantly associated with the exposure factor from the outcome data and documented the corresponding alleles in the supplementary materials. This documentation also includes p-values, standard errors, and effect sizes (beta). The F-statistic for each individual SNP was determined using $F = \mathrm{beta}^2 / \mathrm{se}^2$ to assess the statistical strength of each IV [22]. The overall F-statistic for the SNPs was calculated using $F = \frac{R^2 (N - K - 1)}{K (1 - R^2)}$. $R^2$ represents the proportion of variance explained by each SNP, and its calculation formula is $R^2 = 2 \times \mathrm{eaf} \times (1 - \mathrm{eaf}) \times \mathrm{beta}^2$, where eaf is the effect allele frequency, N represents the sample size of the exposure data, beta denotes the effect of the SNP on the exposure, and K is the number of SNPs [23,24]. F > 10 indicates a stable association between the SNP and the phenotype, suggesting the absence of weak instrumental variable bias [25]. Proxy SNPs were not used in this study.
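The instrument-strength calculations described above can be illustrated with a short Python sketch. It is not the authors' code (the study used R packages), and it assumes the standard forms F = beta²/se² for a single SNP, R² = 2 × eaf × (1 − eaf) × beta² for the variance explained, and F = R²(N − K − 1) / (K(1 − R²)) for the overall statistic. The `Snp` class, the function names, and the example SNP values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    beta: float  # effect of the SNP on the exposure
    se: float    # standard error of beta
    eaf: float   # effect allele frequency


def per_snp_f(snp: Snp) -> float:
    """Single-instrument strength: F = beta^2 / se^2."""
    return (snp.beta / snp.se) ** 2


def variance_explained(snp: Snp) -> float:
    """Variance explained by one SNP: R^2 = 2 * eaf * (1 - eaf) * beta^2."""
    return 2.0 * snp.eaf * (1.0 - snp.eaf) * snp.beta ** 2


def overall_f(r2_total: float, n: int, k: int) -> float:
    """Overall strength: F = R^2 * (N - K - 1) / (K * (1 - R^2))."""
    return r2_total * (n - k - 1) / (k * (1.0 - r2_total))


# Hypothetical SNP; the numbers are invented for illustration only.
example = Snp(beta=0.03, se=0.005, eaf=0.4)
print(f"per-SNP F  = {per_snp_f(example):.1f}")
print(f"per-SNP R2 = {variance_explained(example):.6f}")

# Rounded totals reported in this paper for the GERD-on-RA discovery
# instruments: R^2 = 2.82%, N = 602,604, K = 62.
print(f"overall F  = {overall_f(0.0282, 602_604, 62):.1f}")
```

With the rounded inputs R² = 0.0282, N = 602,604, and K = 62, the last line gives a value of roughly 282, close to the total F-statistic of 282.43 reported for the discovery set; the small gap is presumably due to rounding of R².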

Statistical analysis
We conducted a bidirectional TSMR analysis on the GERD and RA datasets. We harmonized the data to ensure that the SNP effects for the exposure and the outcome correspond to the same alleles. We conducted the analysis using MR-Egger regression, inverse variance weighting (IVW), weighted median, and Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO). Simple mode and weighted mode were employed as supplementary analyses. If all genetic variants meet the IV assumptions, IVW provides a consistent estimate of the causal effect of the exposure on the outcome. Cochran's Q was used to assess heterogeneity. If p > 0.05, we opted for a fixed-effects model; otherwise, a random-effects model was applied. The difference between the intercept of the MR-Egger regression and 0 was used for an initial evaluation of potential horizontal pleiotropy in the IVs. If the p-value of the intercept is greater than 0.05, it suggests the absence of horizontal pleiotropy [26]. Sensitivity analysis was conducted using the MR-PRESSO test, with NbDistribution set to 3000 and SignifThreshold = 0.05. MR-PRESSO can also re-estimate the causal effect after excluding outlying SNPs [27]. If the IVs show no horizontal pleiotropy, the IVW method is considered the most reliable [28]. In this case, IVW served as the primary approach to evaluate the causal effect between GERD and RA. The leave-one-out method was used to determine whether any individual SNP significantly affects the causal estimate. Given that there was a causal relationship between GERD and RA in the univariable analysis, we further performed multivariable MR analysis to adjust for confounding factors.
All analyses were conducted in RStudio with R 4.2.1 (https://www.r-project.org/). The R packages "TwoSampleMR", "MendelianRandomization", "MVMR", and "MR-PRESSO" were used for MR analysis. Finally, we used a web-based tool (https://shiny.cnsgenomics.com/mRnd/) to calculate the MR statistical power [29]. Statistical power is the probability of correctly rejecting the null hypothesis in a hypothesis test, meaning it reflects the ability to detect an actual effect. An appropriate level of statistical power (usually set at 80% or 0.80) ensures that the study is sensitive enough to detect actual effects, reducing the risk of committing a Type II error [30].
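As an illustration of the IVW estimate and the Cochran's Q heterogeneity statistic described in this section, the following Python sketch works directly on per-SNP summary statistics. It is a simplified stand-in for what the R packages compute, not the authors' code. It assumes first-order weights (the standard error of each Wald ratio is taken as se_outcome / |beta_exposure|) and a multiplicative random-effects correction, which is one common convention. The function names and the three example SNPs are hypothetical.

```python
import math


def ivw(bx, by, sy):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    bx: SNP-exposure effects, by: SNP-outcome effects,
    sy: standard errors of the SNP-outcome effects.
    Returns (estimate, fixed-effect SE, Cochran's Q, degrees of freedom).
    """
    ratios = [y / x for x, y in zip(bx, by)]           # per-SNP Wald ratios
    weights = [(x / s) ** 2 for x, s in zip(bx, sy)]   # 1 / var(Wald ratio)
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se_fixed = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of the ratios from the pooled estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se_fixed, q, len(bx) - 1


def random_effects_se(se_fixed, q, df):
    """Multiplicative random-effects SE: inflate by sqrt(Q / df), never deflate."""
    return se_fixed * math.sqrt(max(1.0, q / df))


# Three hypothetical SNPs whose Wald ratios all equal 0.5 (no heterogeneity).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
sy = [0.01, 0.02, 0.01]

estimate, se_fixed, q, df = ivw(bx, by, sy)
print(f"log-OR = {estimate:.3f}, OR = {math.exp(estimate):.3f}")
print(f"SE (fixed) = {se_fixed:.4f}, Q = {q:.3f} on {df} df")
print(f"SE (random) = {random_effects_se(se_fixed, q, df):.4f}")
```

Under the null of no heterogeneity, Q is compared with a chi-square distribution on K − 1 degrees of freedom; the p > 0.05 rule in the text then selects between the fixed-effect and random-effects standard errors.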

Ethics approval and consent to participate
Not applicable since the study is based on summary-level data. In all original studies, ethical approval and consent to participate had been obtained.

Univariable MR analysis
We conducted an MR analysis with GERD as the exposure and RA as the outcome. Screening identified 80 SNPs for GERD, and all 80 were retained after harmonization with RA (discovery). These included 14 palindromic SNPs, 3 outliers (rs4713692, rs6722661, rs7942368), and 1 SNP (rs3828917) strongly associated with the outcome. This SNP was associated with the exposure with a P value of 2.27 × 10⁻⁸ but with the outcome with a P value of 6.87 × 10⁻¹⁸, violating the core assumptions of Mendelian randomization, so we manually excluded it (for details, refer to Supplementary Materials 1-2). After harmonization with RA (validation), 78 SNPs were selected, including 3 palindromic SNPs, 1 outlier (rs4713692), and 1 SNP (rs3828917) strongly associated with the outcome. This SNP was associated with the exposure with a P value of 2.27 × 10⁻⁸ but with the outcome with a P value of 6.19 × 10⁻¹⁰, so we also manually excluded it (for details, refer to Supplementary Materials 3-4). After removing the palindromic SNPs, the outliers, and the SNP strongly associated with the outcome, we ended up with 62 and 73 SNPs as IVs for the discovery and validation sets, respectively. The F-statistics for all individual SNPs in both sets were greater than 10. The total R² for the discovery set was 2.82%, with a total F-statistic of 282.43. For the validation set, the total R² was 3.29%, with a total F-statistic of 280.41. The MR-PRESSO outlier test indicated 4 outliers in the discovery set (rs4713692, rs6722661, rs7942368, rs3828917) and 2 outliers in the validation set (rs3828917, rs4713692). These outliers were already among the outliers and outcome-associated SNPs listed above and had therefore been excluded. All outliers can be found in Supplementary Table 9.
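The harmonization and palindromic-SNP filtering steps referred to above can be sketched in Python as follows. This is an illustrative simplification rather than the procedure implemented in the R packages used for the study: it drops every palindromic SNP (A/T or C/G alleles), flips the sign of the outcome effect when the effect and other alleles are swapped, and does not attempt strand-flip correction. The function names and example alleles are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(effect_allele: str, other_allele: str) -> bool:
    """True when the two alleles are complementary bases (A/T or C/G).

    Such SNPs read the same on both DNA strands, so the strand cannot be
    inferred from the alleles alone.
    """
    return COMPLEMENT.get(effect_allele.upper()) == other_allele.upper()


def harmonize_outcome_beta(ea_exp: str, oa_exp: str,
                           ea_out: str, oa_out: str,
                           beta_out: float) -> Optional[float]:
    """Express the outcome effect on the exposure's effect allele.

    Returns None when the SNP is palindromic or the alleles do not match
    (strand flips are not handled in this sketch).
    """
    ea_exp, oa_exp = ea_exp.upper(), oa_exp.upper()
    ea_out, oa_out = ea_out.upper(), oa_out.upper()
    if is_palindromic(ea_exp, oa_exp):
        return None                      # ambiguous strand: drop
    if (ea_exp, oa_exp) == (ea_out, oa_out):
        return beta_out                  # already aligned
    if (ea_exp, oa_exp) == (oa_out, ea_out):
        return -beta_out                 # alleles swapped: flip the sign
    return None                          # alleles incompatible: drop


print(is_palindromic("A", "T"))
print(harmonize_outcome_beta("A", "G", "G", "A", 0.2))
```

Dropping all palindromic SNPs is the most conservative choice; harmonization tools can sometimes retain them when allele frequencies make the strand unambiguous.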
Next, we conducted a reverse MR analysis with RA as the exposure and GERD as the outcome. Using the previously applied screening criteria, we identified 76 SNPs in RA (discovery), and after harmonizing with GERD, we selected 28 SNPs, including 1 palindromic SNP (for details, refer to Supplementary Materials 5-6). In RA (validation), 24 SNPs were identified, and after harmonization with GERD, 8 SNPs were selected, with no palindromic SNPs found (for details, refer to Supplementary Materials 7-8). The MR-PRESSO outlier test revealed 2 outliers in the discovery set (rs2561477, rs4239702) and 2 outliers in the validation set (rs11758148, rs6065926). After removing palindromic SNPs and outliers, we proceeded with the subsequent MR analysis. Ultimately, we obtained 25 and 6 SNPs as IVs for the discovery and validation sets, respectively. The F-statistics for all individual SNPs in both sets were greater than 10. The R² for the discovery set was 15.4%, with a total F-statistic of 710.01. For the validation set, the R² was 4.90%, with a total F-statistic of 2187.70.
The results of the reverse analysis indicate a causal relationship between RA (discovery) and an increased risk of developing GERD (IVW, OR = 1.03, 95% CI 1.00-1.06, P = 0.012). However, in the validation set, RA does not show a causal relationship with GERD (IVW, OR = 1.02, 95% CI 0.98-1.06, P = 0.235). The results from MR-Egger, weighted median, and MR-PRESSO in both the discovery and validation sets also support their respective conclusions.

Sensitivity analysis
We assessed heterogeneity among the IVs using Cochran's Q test. In the forward MR analysis, we observed heterogeneity in the IVs of the discovery set. Therefore, we employed a random-effects IVW model to assess the causal relationship, and the Egger intercept showed no significant difference from zero. The leave-one-out analysis did not identify any single SNP significantly influencing the causal estimate (discovery, IVW Cochran's Q = 85.57; Table 2). The results from MR-PRESSO support a causal relationship between genetic susceptibility to GERD and RA. However, the findings differ between the discovery and validation sets regarding whether RA can increase the risk of developing GERD. The scatter plots for both analyses are shown in Figs. 3 and 4, and the funnel plots and leave-one-out plots can be found in the supplementary materials (Supplementary Figures 1-4).

Multivariable MR estimates
For the causal effect of GERD on RA, we performed a multivariable MR analysis to adjust for possible confounding by smoking quantity, drinking frequency, BMI, depression, and educational attainment in the RA discovery and validation datasets. After adjusting for smoking quantity (IVW, OR = 1.32, 95% CI 1.01-1.72, P = 0.044), drinking frequency (IVW, OR = 1.39, 95% CI 1.09-1.77, P = 0.008), BMI (IVW, OR = 1.22, [...] In sensitivity analyses, MR-Egger regression for the multivariable MR analysis did not detect horizontal pleiotropy, but there was some degree of heterogeneity, so we used the IVW random-effects method. Detailed results can be found in Fig. 5, Table 3, and Supplementary Tables 11-12. In summary, our results suggest that genetic predisposition to GERD increases the risk of RA, while RA does not have a causal effect on GERD. This conclusion is supported across different models, and the finding that GERD increases the risk of RA holds after adjusting for smoking quantity, drinking frequency, BMI, educational attainment, and depression.
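The idea behind the multivariable IVW estimate can be illustrated with a minimal Python sketch for the two-exposure case (for example, GERD plus one confounder such as smoking quantity). It is not the authors' implementation, which relied on R packages. It assumes the usual formulation in which the SNP-outcome effects are regressed on the SNP-exposure effects of both exposures, without an intercept, using weights 1/se_outcome². The function name and all numbers are hypothetical.

```python
def mvmr_ivw_two_exposures(bx1, bx2, by, sy):
    """Weighted least squares of by on (bx1, bx2) with no intercept.

    bx1, bx2: SNP effects on exposure 1 and exposure 2,
    by: SNP effects on the outcome, sy: standard errors of by.
    Returns the direct effects (theta1, theta2) of the two exposures.
    """
    w = [1.0 / s ** 2 for s in sy]
    # Entries of the weighted normal equations X'WX theta = X'Wy.
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    t1 = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    t2 = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure effects are collinear; effects not identifiable")
    # Solve the 2x2 system by Cramer's rule.
    theta1 = (t1 * s22 - t2 * s12) / det
    theta2 = (t2 * s11 - t1 * s12) / det
    return theta1, theta2


# Hypothetical SNP effects; the outcome is built so that the true direct
# effects are 0.3 (exposure 1) and 0.1 (exposure 2).
bx1 = [0.10, 0.05, 0.20, 0.08]
bx2 = [0.02, 0.10, 0.05, 0.12]
by = [0.3 * a + 0.1 * b for a, b in zip(bx1, bx2)]
sy = [0.01, 0.01, 0.01, 0.01]

theta1, theta2 = mvmr_ivw_two_exposures(bx1, bx2, by, sy)
print(f"direct effect of exposure 1: {theta1:.3f}")
print(f"direct effect of exposure 2: {theta2:.3f}")
```

The coefficient for the first exposure is its effect on the outcome conditional on the second, which is the sense in which the GERD estimates above are adjusted for each confounder.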

Discussion
This study employed a bidirectional TSMR approach on publicly available GWAS data. We demonstrated a positive causal effect of GERD on RA in the European population using a discovery set. The validation set was subsequently used to confirm our findings. The reverse MR analysis did not show a causal effect of RA on GERD.
Although the availability of new anti-rheumatic drugs has dramatically improved the prognosis and the mortality of RA patients has been declining over the past three decades, the prevalence of the disease has gradually increased [31]. RA is challenging to cure and typically requires lifelong treatment. The damage caused by the disease and the economic burden of treatment continue to be significant global challenges in managing RA [32]. The exact mechanisms underlying the onset of RA remain unclear. Research suggests that aside from genetic factors, environmental elements such as malnutrition, low educational attainment, smoking, occupational exposure to silica, periodontitis, and the microbiome could be associated with the development of RA [33][34][35][36][37].
Smoking is one of the important factors leading to RA: it not only increases the body's oxidative stress response, but also promotes systemic inflammation and interferes with apoptosis [38]. One meta-analysis reported a 26% increased risk of RA even among light smokers (1-10 pack-years) compared with those who had never smoked (RR = 1.26, 95% CI 1.14-1.39) [39]. Low and moderate alcohol consumption has been shown to reduce the risk of RA [40,41], and one meta-analysis has demonstrated a negative association between alcohol intake and the risk of ACPA-positive RA [42]. It has been reported that there may be an interaction between alcohol and smoking in influencing the risk of developing RA, and that alcohol consumption may weaken the association between smoking and the onset of RA [43]. Obesity is considered a risk factor for many conditions, including RA. Results from one meta-analysis suggest that an increase in BMI may be associated with an increased risk of RA (RR = 1.23, 95% CI 1.09-1.39) [14]. Depression is one of the common comorbidities in patients with RA, and epidemiological studies have shown that depression is a risk factor for RA. In a study from Taiwan, the risk of RA was significantly higher in depressed people than in non-depressed people (HR = 1.65, 95% CI 1.41-1.77) [44]. Another study of patients in the UK found that the risk of RA increased by 38% in depressed people compared with non-depressed people after adjusting for age, sex, smoking, BMI, comorbidities, and antidepressant use (HR = 1.38, 95% CI 1.31-1.46) [45]. Previous MR studies have also shown that higher educational attainment has a protective effect on RA (OR = 0.37, 95% CI 0.31-0.44) [12]. After performing multivariable MR analyses and adjusting for these confounders, we found that genetic susceptibility to GERD was still causally associated with RA.
However, explaining the causal effect of genetic susceptibility to GERD on RA can be challenging (Fig. 6). One of the hypotheses in the etiology of RA is the "mucosal origin," suggesting that the autoimmune response leading to the development of RA is triggered within the relevant lymphoid tissues in the mucosa of the lungs, oral cavity, and gastrointestinal tract [46]. Factors such as the absorption of toxic substances in the intestines, disruption of gastrointestinal anatomical structures, and alterations in the microbiota can contribute to the formation of synovitis [34]. Studies have indicated that RA-associated autoantibodies can be produced in the pulmonary mucosa and lymph nodes. Local enrichment of anti-citrullinated protein antibodies (ACPA) has been detected in the sputum of early untreated RA patients [47]. Periodontitis and oral microbiota, with representatives like Porphyromonas gingivalis, can also contribute to the development of RA [48,49]. GERD, as a form of chronic esophageal damage, is increasingly being considered as a potential contributor to the development of RA, possibly even in its early stages, breaking through the mucosal barrier and potentially triggering an immune response. In normal conditions, the esophageal microbiota is primarily composed of Gram-positive bacteria. However, in GERD patients, there is a shift towards an increased presence of Gram-negative bacteria, including genera like Prevotella, Haemophilus, Neisseria, Campylobacter, and Clostridium [50,51]. With the increasing proportion of Gram-negative bacteria, there is also an elevation in lipopolysaccharide content. This can subsequently lead to an upregulation of gene expression through the Toll-like receptor 4 and NFκB pathways, resulting in increased pro-inflammatory cytokine expression [52]. Recent germ-free experiments have shown how individual microbial communities affect specific immune cell populations, altering the balance between pro-inflammatory cells and regulatory T cells both at mucosal sites and within the bloodstream [53]. Following dysbiosis of the gut microbiota, segmented filamentous bacteria can activate Th17 cells within the lamina propria, leading to a reduced proportion of anti-inflammatory Tregs, exacerbating systemic inflammatory responses, fostering an autoimmune predisposition, and ultimately precipitating arthritis [54][55][56]. Research conducted by Jose U. Scher indicates a strong association between the presence of Prevotella copri in the gut and newly diagnosed, untreated RA [57]. That study also identifies a potential role of this bacterium in the pathogenesis of RA. GERD patients also exhibit Prevotella copri in the distal esophagus, where this bacterium may likewise contribute to the development of RA. A human model study focusing on conditions such as Whipple's disease, which aligns with the gut-joint axis hypothesis, has indicated that Tropheryma whipplei can trigger RA in susceptible individuals [58]. It has been reported that this disease not only affects the small intestine but can also involve the esophagus, pharynx, duodenum, colon, and other areas [59]. Furthermore, chronic mucosal inflammation appears to be a significant mechanism in the pathogenesis of RA. Reports suggest that TLRs expressed in the esophageal mucosa mediate the interaction between the immune system and the microbiota, which could also be a mechanism driving chronic inflammatory responses [60].
In addition, non-steroidal anti-inflammatory drugs (NSAIDs) and corticosteroids taken by RA patients during treatment are among the causes of gastrointestinal complications [61], and antirheumatic drugs can sometimes cause adverse gastrointestinal events [62]. As a result, the link between RA and GERD reported in some observational studies may have been influenced by these treatments. Our findings using MR analysis suggest that RA does not have a causal effect on GERD.
It should be emphasized that RA is a complex autoimmune disease characterized by the interaction of multiple causative factors in its pathogenesis. Therefore, there is a need for more comprehensive and detailed investigation into the mechanisms through which GERD may contribute to the development of RA.
In the reverse MR analysis, the results from the discovery set indicated an increased risk of GERD associated with RA, whereas the validation set showed no such effect. We speculate that this discrepancy might be due to a small overlap in samples between the RA (discovery) and GERD datasets. Since there is no sample overlap between RA (validation) and GERD, such results would not be expected there. Theoretically, there is also the possibility of a false negative due to the small size of the validation dataset, but we lean towards the first explanation.
This study has some limitations. First, because of weak instrument bias (F-statistic < 10), the causal effect of GERD on RA after adjustment for BMI and depression should be interpreted with caution. In addition, the use of different databases may contribute to heterogeneity. However, the use of the IVW random-effects method and the absence of horizontal pleiotropy indicate that our results are unlikely to be affected by heterogeneity [62]. Second, our GWAS data are derived from European populations, which may limit the generalizability of the MR results to other populations. Third, we did not perform a stratified analysis based on serum ACPA status or gender; future research could explore this aspect in more detail. Lastly, the credibility of the IVs depends to a certain extent on the sample size of the GWAS. In the future, larger-scale GWAS data will be necessary to validate the conclusions.

Conclusion
In summary, our analysis supports a causal relationship between genetic susceptibility to GERD and an increased risk of RA. This finding is important for deepening our understanding of the pathogenesis of RA and may offer new insights for the prevention and treatment of RA. It also offers a new perspective on preventing the occurrence of RA in patients with GERD. However, the reverse MR analysis using the existing datasets did not provide compelling evidence that RA increases the risk of developing GERD.

Figure 1. MR analysis workflow and MR assumptions. A concise experimental flowchart and the three assumptions that Mendelian randomization analysis needs to fulfill.

Figure 2. Causal estimate and sensitivity analysis of univariable MR analysis. Number of SNPs; OR, odds ratio; GERD, gastroesophageal reflux disease; RA, rheumatoid arthritis.

Figure 3. Scatter plot for three methods (GERD on RA). Scatter plots of the MR analysis with GERD as the exposure. On the left is GERD on RA discovery, on the right is GERD on RA validation. GERD, gastroesophageal reflux disease; RA, rheumatoid arthritis.

Figure 4. Scatter plot for three methods (RA on GERD). Scatter plots of the MR analysis with RA as the exposure. On the left is RA discovery on GERD, on the right is RA validation on GERD.

Figure 6. Possible mechanisms leading to RA. The figure illustrates some of the mechanisms underlying mucosal involvement in rheumatoid arthritis, as well as certain gut-joint axis mechanisms mentioned in the text.

Table 2. Heterogeneity results of MR analysis. Total F-statistic and R².